Cluster validity measure and merging system for hierarchical clustering considering outliers

Authors

  • Frank de Morsier
  • Devis Tuia
  • Maurice Borgeaud
  • Volker Gass
  • Jean-Philippe Thiran
Abstract

Clustering algorithms have evolved to handle increasingly complex structures. However, measures that quantify the quality of the resulting clustering partitions are rare and have been developed only for specific algorithms. In this work, we propose a new cluster validity measure (CVM) to quantify the clustering performance of hierarchical algorithms that handle overlapping clusters of any shape in the presence of outliers. This work also introduces a cluster merging system (CMS) to group clusters that share outliers. When located in regions of cluster overlap, these outliers may originate from a mixture of nearby cores. The proposed CVM and CMS are applied to hierarchical extensions of the Support Vector and Gaussian Process Clustering algorithms in both synthetic and real experiments. The results show that the proposed metrics help to select the appropriate level of hierarchy and the appropriate...

Similar resources

An a contrario approach to hierarchical clustering validity assessment

In this paper we present a method to detect natural groups in a data set, based on hierarchical clustering. A measure of the meaningfulness of clusters, derived from a background model assuming no class structure in the data, provides a way to compare clusters, and leads to a cluster validity criterion. This criterion is applied to every cluster in the nested structure. While all clusters passi...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

A Relative Approach to Hierarchical Clustering

This paper presents a new approach to agglomerative hierarchical clustering. Classical hierarchical clustering algorithms are based on metrics which only consider the absolute distance between two clusters, merging the pair of clusters with highest absolute similarity. We propose a relative dissimilarity measure, which considers not only the distance between a pair of clusters, but also how dis...

Bidirectional Hierarchical Clustering for Web Mining

In this paper we propose a new bidirectional hierarchical clustering system for addressing the challenges of web mining. The key feature of our approach is that it maximizes the intra-cluster similarity in the bottom-up cluster-merging phase and minimizes the inter-cluster similarity in the top-down refinement phase. This two-pass approach achieves better clustering than existin...

Journal title:
  • Pattern Recognition

Volume 48  Issue 

Pages  -

Publication date 2015